How To Tell Head From Tail in User-Generated Content Corpora

Author

  • Nishanth R. Sastry
Abstract

This paper asks whether unpopular tail items in user-generated content corpora are important, and how tail items differ from the popular head items. We develop a user-centric characterisation of the tail which shows that although the head receives a disproportionate share of interest, tail items collectively serve a large number of users. “Tail seekers”, with more ‘like’s in the tail than the head, are shown to constitute more than half the user base. We then examine how interests in head and tail items differ. Temporally, head items are found to enjoy a sustained interest, whereas interest in tail items is short-lived. Spatially, interest in tail items is more geographically diverse. Finally, from a social angle, interest in unpopular items appears to be more “viral” than non-viral. We discuss implications of these observations for the handling and distribution of user-generated content.
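
The abstract's definition of a "tail seeker" (a user with more likes in the tail than in the head) can be illustrated with a minimal sketch. The cut-off between head and tail is not stated in the abstract, so the `head_fraction` parameter below is purely an assumption for illustration, as are the function names and the `(user, item)` input format; this is not the paper's actual procedure.

```python
from collections import Counter


def split_head_tail(likes, head_fraction=0.1):
    """Split items into head and tail by like count.

    likes: iterable of (user, item) pairs.
    head_fraction: ASSUMED share of top-ranked items treated as the head;
    the abstract does not give the real threshold.
    """
    likes = list(likes)
    popularity = Counter(item for _, item in likes)
    ranked = [item for item, _ in popularity.most_common()]
    head_size = max(1, int(len(ranked) * head_fraction))
    return set(ranked[:head_size]), set(ranked[head_size:])


def tail_seekers(likes, head):
    """Return users with strictly more likes on tail items than on head items."""
    head_counts = Counter()
    tail_counts = Counter()
    for user, item in likes:
        if item in head:
            head_counts[user] += 1
        else:
            tail_counts[user] += 1
    users = set(head_counts) | set(tail_counts)
    return {u for u in users if tail_counts[u] > head_counts[u]}


# Toy data (hypothetical): item "x" is popular, "y" and "z" are not.
likes = [("a", "x"), ("b", "x"), ("c", "x"),
         ("a", "y"), ("a", "z"), ("b", "y")]
head, tail = split_head_tail(likes, head_fraction=0.34)
seekers = tail_seekers(likes, head)
```

In this toy data only user `a` has more tail likes than head likes; user `b` is tied and so does not count under the strict "more than" reading used here.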

Similar resources

O-10: Formation and Molecular Composition of The Sperm Head to Tail Coupling Apparatus

Background: According to a worldwide survey in 2010, infertility affects 48.5 million couples. In roughly half of the cases, infertility is attributable to the male partner. Thus, a significant percentage of young men are infertile, but the underlying causes are mostly unknown. Male fertility and reproductive success critically depend on proper formation of the mature sperm. Transmission of the geneti...

Extracting Resources that Help Tell Events' Stories

Social media platforms constitute a valuable source of information regarding real-world happenings. In particular, user generated content on mobile-oriented platforms like Twitter allows for real-time narrations thanks to the instantaneous nature of publishing. A common practice for users is to include in the tweets links pointing to articles, media files and other resources. In this paper, we ...

Automatic Entity Recognition and Typing in Massive Text Corpora

In today’s computerized and information-based society, we are soaked with vast amounts of natural language text data, ranging from news articles, product reviews, advertisements, to a wide range of user-generated content from social media. To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of entities and the relationsh...

A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content

To extract meaningful phrases from corpora (e.g., in an information retrieval context), intensive knowledge of the domain in question and the respective documents is generally needed. When moving to a new domain or language, the underlying knowledge bases and models need to be adapted, which is often time-consuming and labor-intensive. This paper addresses the described challenge of phras...

Unsupervised Detection of Argumentative Units though Topic Modeling Techniques

In this paper we present a new unsupervised approach, “Attraction to Topics” (A2T), for the detection of argumentative units, a sub-task of argument mining. Motivated by the importance of topic identification in manual annotation, we examine whether topic modeling can be used for performing unsupervised detection of argumentative sentences, and to what extent topic modeling can be used to clas...

Publication date: 2012